178 research outputs found

    Distributional Semantic Models for Clinical Text Applied to Health Record Summarization

    As information systems in the health sector are becoming increasingly computerized, large amounts of care-related information are being stored electronically. In hospitals clinicians continuously document treatment and care given to patients in electronic health record (EHR) systems. Much of the information being documented is in the form of clinical notes, or narratives, containing primarily unstructured free-text information. For each care episode, clinical notes are written on a regular basis, ending with a discharge summary that basically summarizes the care episode. Although EHR systems are helpful for storing and managing such information, there is an unrealized potential in utilizing this information for smarter care assistance, as well as for secondary purposes such as research and education. Advances in clinical language processing are enabling computers to assist clinicians in their interaction with the free-text information documented in EHR systems. This includes assisting in tasks like query-based search, terminology development, knowledge extraction, translation, and summarization. This thesis explores various computerized approaches and methods aimed at enabling automated semantic textual similarity assessment and information extraction based on the free-text information in EHR systems. The focus is placed on the task of (semi-)automated summarization of the clinical notes written during individual care episodes. The overall theme of the presented work is to utilize resource-light approaches and methods, circumventing the need to manually develop knowledge resources or training data. Thus, to enable computational semantic textual similarity assessment, word distribution statistics are derived from large training corpora of clinical free text and stored as vector-based representations referred to as distributional semantic models. 
Resource-light methods are also explored for automatic summarization of clinical free-text information, relying on semantic textual similarity assessment. Novel and experimental methods are presented and evaluated that focus on: a) distributional semantic models trained in an unsupervised manner from statistical information derived from large unannotated clinical free-text corpora; b) representing and computing semantic similarities between linguistic items of different granularity, primarily words, sentences and clinical notes; and c) summarizing clinical free-text information from individual care episodes. Results are evaluated against gold standards that reflect human judgements. The results indicate that the use of distributional semantics is promising as a resource-light approach to automated capturing of semantic textual similarity relations from unannotated clinical text corpora. Here it is important that the semantics correlate with the clinical terminology, and with various semantic similarity assessment tasks. Improvements over classical approaches are achieved when the underlying vector-based representations allow for a broader range of semantic features to be captured and represented. These are either distributed over multiple semantic models trained with different features and training corpora, or use models that store multiple sense-vectors per word. Further, the use of structured meta-level information accompanying care episodes is explored as training features for distributional semantic models, with the aim of capturing semantic relations suitable for care episode-level information retrieval. Results indicate that such models perform well in clinical information retrieval. It is shown that a method called Random Indexing can be modified to construct distributional semantic models that capture multiple sense-vectors for each word in the training corpus.
This is done in a way that retains the original training properties of the Random Indexing method, by being incremental, scalable and distributional. Distributional semantic models trained with a framework called Word2vec, which relies on the use of neural networks, outperform those trained using the classic Random Indexing method in several semantic similarity assessment tasks, when training is done using comparable parameters and the same training corpora. Finally, several statistical features in clinical text are explored in terms of their ability to indicate sentence significance in a text summary generated from the clinical notes. This includes the use of distributional semantics to enable case-based similarity assessment, where cases are other care episodes and their “solutions”, i.e., discharge summaries. A type of manual evaluation is performed, where human experts rate the different aspects of the summaries using an evaluation scheme/tool. In addition, the original clinician-written discharge summaries are explored as gold standard for the purpose of automated evaluation. Evaluation shows a high correlation between manual and automated evaluation, suggesting that such a gold standard can function as a proxy for human evaluations. --- This thesis has been published jointly with Norwegian University of Science and Technology, Norway and University of Turku, Finland. Transferred from Doria.

    Association between body composition and external load performance in official football matches

    Purpose: The purpose of this study was to investigate the association between body composition and physical performance in official football matches among professional football players. Methods: Twelve professional male football players participated in this study. Body composition was measured with two dual-energy X-ray absorptiometry examinations, and the mean of the two was used in the correlation analysis. From the dual-energy X-ray absorptiometry scans, lean, fat and fat-free mass variables were extracted for the whole body and regionally for the legs. Seven official matches were played in the period between the two scans. Physical performance in the matches was extracted using a tracking device with a Global Positioning System and an inertial measurement unit. Physical performance variables include total distance and distance at different speeds, maximum speed, high-intensity events, and an accumulated load measure, PlayerLoad™. Results: Bayesian correlation analysis showed moderate evidence, and Kendall's tau-b showed moderate to large correlations, between total body fat percentage and sprint-speed running distance (BF10 = 4.15; tau-b = -0.52), between leg fat mass and sprint-speed running distance (BF10 = 3.17; tau-b = -0.49), and between leg fat percentage and PlayerLoad™ per minute (BF10 = 3.31; tau-b = -0.49). Conclusion: The data indicate that lower levels of fat mass and fat percentage are factors that affect physical performance in football. The results of this study give athletes and coaches information about the association between body composition and physical performance in official matches for professional football players.
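The rank correlation reported in the abstract above can be illustrated with a minimal sketch of Kendall's tau-b. This is a plain-Python illustration and not the study's analysis code: the function name `kendall_tau_b` is an assumption made for this example, and the Bayes factor (BF10) part of the analysis is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length numeric sequences (tie-corrected).

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            # A pair tied in both variables counts as a tie for each of them.
            ties_x += 1
            ties_y += 1
        elif dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1

    n_pairs = len(x) * (len(x) - 1) // 2
    denominator = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denominator == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denominator
```

Called with one body-composition value and one match-performance value per player, a negative result means that players ranked higher on the first variable tend to rank lower on the second, which is the direction of the associations reported above.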

    The breast cancer genome--a key for better oncology.

    Molecular classification has added important knowledge to breast cancer biology, but has yet to be implemented as a clinical standard. Full sequencing of breast cancer genomes could potentially refine classification and give a more complete picture of the mutational profile of cancer and thus aid therapy decisions. Future treatment guidelines must be based on the knowledge derived from histopathological sub-classification of tumors, but with added information from genomic signatures when properly clinically validated. The objective of this article is to give some background on molecular classification, the potential of next generation sequencing, and to outline how this information could be implemented in the clinic. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief, you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.

    Distinct patterns in alpine vegetation around dens of the Arctic fox

    The arctic fox Alopex lagopus excavates its dens in gravelly ridges and hillocks, and creates a local environment quite distinct from the surrounding tundra or heath landscape. In northern Sweden, the vegetation of 18 dens of the arctic fox was investigated, as well as reference areas off the dens but in geologically and topographically similar locations. The species composition showed considerable differences between den and reference areas, with grasses and forbs occurring more abundantly on the dens, and evergreen dwarf-shrubs occurring more in reference areas. The effect of the foxes' activities is thought to be either through mechanical soil disturbance, or through nutrient enrichment via scats, urine, and carcasses. This was expected to result in differences in plant traits with key functional roles in resource acquisition and regeneration, when comparing dens with reference areas. We hypothesised that the community mean of specific leaf area (SLA) would differ if nutrient enrichment was the more important effect, and that seed weight, inversely proportional to seed number per ramet and hence dispersal ability, would differ if soil disturbance was the more important effect. Specific leaf area showed a significant difference, indicating nutrient enrichment to be the most important effect of the arctic fox on the vegetation on its dens. Arctic foxes act as ecosystem engineers on a small scale, maintaining niches for relatively short-lived nutrient-demanding species on their dens in spite of the dominance of long-lived ericaceous dwarf-shrubs in the landscape matrix. Thus, foxes contribute to the maintenance of species richness on the landscape level.

    Care episode retrieval: distributional semantic models for information retrieval in the clinical domain

    Patients' health-related information is stored in electronic health records (EHRs) by health service providers. These records include sequential documentation of care episodes in the form of clinical notes. EHRs are used throughout the health care sector by professionals, administrators and patients, primarily for clinical purposes, but also for secondary purposes such as decision support and research. The vast amounts of information in EHR systems complicate information management and increase the risk of information overload. Therefore, clinicians and researchers need new tools to manage the information stored in the EHRs. A common use case is, given a - possibly unfinished - care episode, to retrieve the most similar care episodes among the records. This paper presents several methods for information retrieval, focusing on care episode retrieval, based on textual similarity, where similarity is measured through domain-specific modelling of the distributional semantics of words. Models include variants of random indexing and the semantic neural network model word2vec. Two novel methods are introduced that utilize the ICD-10 codes attached to care episodes to better induce domain-specificity in the semantic model. We report on experimental evaluation of care episode retrieval that circumvents the lack of human judgements regarding episode relevance. Results suggest that several of the methods proposed outperform a state-of-the-art search engine (Lucene) on the retrieval task.
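The retrieval use case described above (given a care episode, find the most similar ones) can be sketched as follows. This is a simplified illustration, not the paper's method: it assumes that an episode is represented by the average of its word vectors and that similarity is the cosine between episode vectors. The vectors in `WORD_VECTORS` are invented toy values, and the names `episode_vector`, `cosine` and `rank_episodes` are hypothetical.

```python
from math import sqrt

# Hypothetical 3-dimensional word vectors, invented for this example.
# A real model would be trained on a large clinical corpus, for instance
# with random indexing or word2vec.
WORD_VECTORS = {
    "chest": [0.90, 0.10, 0.00],
    "pain": [0.80, 0.20, 0.10],
    "angina": [0.85, 0.15, 0.05],
    "fracture": [0.10, 0.90, 0.10],
    "wrist": [0.00, 0.80, 0.20],
    "cast": [0.10, 0.70, 0.30],
}


def episode_vector(text):
    """Average the vectors of all known words in the episode text."""
    tokens = [t for t in text.lower().split() if t in WORD_VECTORS]
    if not tokens:
        raise ValueError("no known words in episode text")
    dims = len(WORD_VECTORS[tokens[0]])
    return [sum(WORD_VECTORS[t][i] for t in tokens) / len(tokens)
            for i in range(dims)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = sqrt(sum(p * p for p in a)) * sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def rank_episodes(query_text, episodes):
    """Return episode ids ordered from most to least similar to the query."""
    query = episode_vector(query_text)
    scored = [(cosine(query, episode_vector(text)), episode_id)
              for episode_id, text in episodes.items()]
    return [episode_id for _, episode_id in sorted(scored, reverse=True)]
```

Passing a query text and a dictionary that maps episode ids to note text returns the ids with the closest episode first. Averaging word vectors is only one way to compose an episode representation, and it is chosen here for brevity.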

    Exploring unsupervised query paraphrasing to identify relevant search phrases for a literature review

    Literature databases have multifaceted search options, but emerging research areas do not have an established terminology and therefore it is difficult to find relevant literature when conducting a review. This study aimed to explore if an unsupervised paraphrasing approach is useful in identifying relevant search phrases for a literature review on an emerging research topic – situational leadership in critical care. Using an initial set of 12 search phrases, the system was used to propose additional phrases, which were manually classified and further used in an expanded PubMed database search. Finally, we assessed the papers found with the expanded search and compared this to the initial search results. As a result, the expanded search more than tripled the search results, from 182 to 673 papers. The expanded search also more than tripled the number of relevant papers, from 12 in the original search to 39 in the expanded search.

    Identifying nursing sensitive indicators from electronic health records in acute cardiac care―Towards intelligent automated assessment of care quality

    Aim: The aim of this study is to explore the potential of using electronic health records for assessment of nursing care quality through nursing-sensitive indicators in acute cardiac care. Background: Nursing care quality is a multifaceted phenomenon, making a holistic assessment of it difficult. Quality assessment systems in acute cardiac care units could benefit from big data-based solutions that automatically extract and help interpret data from electronic health records. Methods: This is a deductive descriptive study that followed the theory of value-added analysis. A random sample from electronic health records of 230 patients was analysed for selected indicators. The data included documentation in structured and free-text format. Results: One thousand six hundred seventy-six expressions were extracted and divided into (1) established and (2) unestablished expressions, providing positive, neutral and negative descriptions related to care quality. Conclusions: Electronic health records provide a potential source of information for information systems to support assessment of care quality. More research is warranted to develop, test and evaluate the effectiveness of such tools in practice. Implications for Nursing Management: Knowledge-based health care management would benefit from the development and implementation of advanced information systems, which use continuously generated, already available real-time big data for improved data access and interpretation to better support nursing management in quality assessment.

    Combining supervised and unsupervised named entity recognition to detect psychosocial risk factors in occupational health checks

    Introduction: In occupational health checks, information about psychosocial risk factors, which influence work ability, is documented in free text. Early detection of psychosocial risk factors helps occupational health care choose the right, targeted interventions to maintain work capacity. The aim of this study was to evaluate whether the recognition of these psychosocial risk factors in electronic records of occupational health checks can be automated with natural language processing (NLP). Materials and methods: We compared supervised and unsupervised named entity recognition (NER) to detect psychosocial risk factors from health checks’ documentation. These records were written by occupational health nurses. Results: Both methods found over 60% of psychosocial risk factors from the records. However, the combination of BERT-NER (supervised NER) and QExp (query expansion/paraphrasing) seems to be more suitable. In both methods the most (correct) risk factors were found in the work environment and equipment category. Conclusion: This study showed that it was possible to detect risk factors automatically from free-text documentation of health checks. It is possible to develop a text mining tool to automate the detection of psychosocial risk factors at an early stage.

    How GPs can Recognize Persistent Frequent Attenders at Finnish Primary Health Care Using Electronic Patient Records

    Introduction: The proportion of patients who are frequent attenders (FAs) varies from a few percent to almost 30% of all patients. A small group of patients continue to visit GPs year after year. Previous studies have reported that over 15% of all 1-year FAs were persistent frequent attenders (pFAs). Objectives: This study aimed to identify typical features of pFAs from the textual content of their medical entries, which could help GPs recognize pFAs easily and facilitate treatment. Methods: A retrospective register study was conducted using 10 years of electronic patient records. The data were collected from Finnish primary health care centers and used to analyze chronic symptoms and diagnoses of pFAs and to calculate the inverse document frequency (IDF) weight of words used in the patient records. IDF was used to determine which words, if any, are typical for pFAs. The study group consisted of the 5-year pFAs and the control group of 1-year FAs. The main background variables were age, gender, occupation, smoking habits, use of alcohol, and BMI. Results: Out of 4392 frequent attenders, 6.6% were pFAs for 3 years and 1.1% were pFAs for 5 years. Of the pFAs, 65% were female and 35% were male. The study group had significantly more depressive episodes (P = .004), heart failure (P = .019), asthma (P = .032), COPD (P = .036), epilepsy (P = .035), and lumbago (P = .046) compared to the control group. GPs described their 5-year pFAs by words related to lung and breathing issues, but there was no statistical difference from the 1-year FAs’ descriptions. Conclusion: A typical pFA seems to be a woman, aged about 55 years, with depressive episodes, asthma or COPD, and lower back pain. Physicians describe pFAs with ordinary words in patient records. It was not possible to differentiate pFAs from 1-year FAs in this way.
© The Author(s) 2021. Author keywords: describing persistent frequent attenders; electronic patient entry; persistent frequent attender; practice management; primary care
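The inverse document frequency weighting mentioned in the abstract above can be sketched as follows. The abstract does not say which IDF variant was used, so this example assumes the common unsmoothed form, the natural logarithm of (number of documents / number of documents containing the word). The function name `inverse_document_frequency` and the whitespace tokenization are assumptions made for this example.

```python
from math import log


def inverse_document_frequency(documents):
    """Map each word to log(N / df) over a list of document strings.

    N is the number of documents and df is the number of documents that
    contain the word at least once. Unsmoothed variant, assumed here
    for illustration.
    """
    n_docs = len(documents)
    doc_freq = {}
    for text in documents:
        # Count each word once per document, regardless of repetitions.
        for word in set(text.lower().split()):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: log(n_docs / df) for word, df in doc_freq.items()}
```

A word that appears in every record gets a weight of zero, while a word confined to a few records gets a high weight. Comparing such weights between the records of two patient groups is one way to look for group-typical vocabulary.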